Mining Patterns from Structured Data by Beam-Wise Graph-Based Induction

Authors

  • Takashi Matsuda
  • Hiroshi Motoda
  • Tetsuya Yoshida
  • Takashi Washio
Abstract

Graph-Based Induction (GBI) extracts typical patterns from graph data by stepwise pair expansion (pairwise chunking). Its greedy search strategy makes it very efficient, but the search is incomplete. We improve its search capability without adding much computational cost by 1) incorporating a beam search, 2) using a different evaluation function to extract patterns that are more discriminatory than those that simply occur frequently, and 3) adopting canonical labeling to enumerate identical patterns accurately. The new algorithm, called Beam-wise GBI (B-GBI for short), was tested on the promoter dataset from the UCI repository and successfully extracted discriminatory substructures. We evaluated the effect of beam width on the number of discovered attributes and on predictive accuracy. The best result obtained by this approach was better than the best previously known result. B-GBI was then applied to a real-world dataset, the Hepatitis dataset provided by Chiba University. Our very preliminary results indicate that B-GBI can handle graphs with a few thousand nodes and extract discriminatory patterns.
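
The idea of combining pairwise chunking with a beam search can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`pair_counts`, `chunk`, `beam_chunking`) and the graph encoding (integer node ids, undirected edges stored as `(u, v)` tuples with `u < v`, no self-loops) are assumptions made for illustration. Raw pair frequency stands in for the evaluation function, and chunk labels are plain strings rather than canonical labels, so duplicate patterns are not merged.

```python
from collections import Counter

def pair_counts(labels, edges):
    """Count how often each unordered pair of node labels is joined by an edge."""
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)

def chunk(labels, edges, pair):
    """Merge non-overlapping occurrences of `pair` into single chunk nodes."""
    new_label = "(" + pair[0] + "-" + pair[1] + ")"
    merged = {}                      # old node id -> id of the chunk absorbing it
    new_labels = dict(labels)
    next_id = max(labels) + 1
    for u, v in sorted(edges):       # fixed order keeps the result deterministic
        if u in merged or v in merged:
            continue                 # each node joins at most one chunk per pass
        if tuple(sorted((labels[u], labels[v]))) == pair:
            merged[u] = merged[v] = next_id
            new_labels[next_id] = new_label
            del new_labels[u], new_labels[v]
            next_id += 1
    new_edges = set()
    for u, v in edges:
        a, b = merged.get(u, u), merged.get(v, v)
        if a != b:                   # edges inside a chunk disappear
            new_edges.add((min(a, b), max(a, b)))
    return new_labels, new_edges

def beam_chunking(labels, edges, beam_width, steps):
    """Keep up to `beam_width` rewritten graphs alive instead of a single greedy one."""
    beam = [(labels, edges)]
    found = []                       # (chunk label, frequency when selected)
    for _ in range(steps):
        candidates = []
        for idx, (lab, edg) in enumerate(beam):
            for pair, count in pair_counts(lab, edg).items():
                candidates.append((count, pair, idx))
        # Highest frequency first; ties broken by pair and state index.
        candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
        next_beam = []
        for count, pair, idx in candidates[:beam_width]:
            lab, edg = beam[idx]
            next_beam.append(chunk(lab, edg, pair))
            found.append(("(" + pair[0] + "-" + pair[1] + ")", count))
        if not next_beam:
            break                    # no edges left to chunk
        beam = next_beam
    return found

# Toy chain graph a-b-a-b-c.
labels = {0: "a", 1: "b", 2: "a", 3: "b", 4: "c"}
edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
print(beam_chunking(labels, edges, beam_width=2, steps=2))
```

With `beam_width=1` the sketch reduces to plain greedy chunking; a wider beam also follows lower-ranked pairs, which is how a beam search trades some extra computation for a less incomplete search.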

Similar resources

Preliminary Analysis of Hepatitis Data by Beam-wise Graph-Based Induction

Graph-Based Induction (GBI) extracts typical patterns from graph data by stepwise pair expansion (pairwise chunking). It is very efficient because of its greedy search strategy but at the same time it suffers from the incompleteness of search. Improvement is made on its search capability without imposing much computational complexity by 1) incorporating a beam search, 2) using a different evalu...

Mining Discriminative Patterns from Graph Structured Data with Constrained Search

A graph mining method, Chunkingless Graph-Based Induction (Cl-GBI), finds typical patterns in graph-structured data through an operation called chunkingless pairwise expansion, which generates pseudo-nodes from selected pairs of nodes in the data. Cl-GBI can extract overlapping subgraphs, but it requires more time and space. Thus, it happens that Cl-GBI cannot extra...

Constructing a Decision Tree for Graph-Structured Data and its Applications

A machine learning technique called Graph-Based Induction (GBI) efficiently extracts typical patterns from graph-structured data by stepwise pair expansion (pairwise chunking). It is very efficient because of its greedy search. Meanwhile, a decision tree is an effective means of data classification from which rules that are easy to understand can be obtained. However, a decision tree could not ...

Constructing a Decision Tree for Graph Structured Data

Decision tree Graph-Based Induction (DT-GBI) is proposed that constructs a decision tree for graph structured data. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion (pairwise chunking) in GBI to be used as attributes for testing. Since attributes (features) are constructed while a classifier is being constructed, DT-GBI can be conceived as a meth...

Constructing Decision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction

A decision tree is an effective means of data classification from which one can obtain rules that are easy to understand. However, decision trees cannot be conventionally constructed for data which are not explicitly expressed with attribute-value pairs such as graph-structured data. We have proposed a novel algorithm, named Chunkingless Graph-Based Induction (Cl-GBI), for extracting typical pa...

Publication date: 2002